Orion-Bix: Bi-Axial Attention for Tabular In-Context Learning

Bouadi, Mohamed, Seth, Pratinav, Tanna, Aditya, Sankarapu, Vinay Kumar

arXiv.org Machine Learning

Tabular data drive most real-world machine learning applications, yet building general-purpose models for them remains difficult. Mixed numeric and categorical fields, weak feature structure, and limited labeled data make scaling and generalization challenging. To this end, we introduce Orion-Bix, a tabular foundation model that combines biaxial attention with meta-learned in-context reasoning for few-shot tabular learning. Its encoder alternates standard, grouped, hierarchical, and relational attention, fusing their outputs through multi-CLS summarization to capture both local and global dependencies efficiently. A label-aware ICL head adapts on the fly and scales to large label spaces via hierarchical decision routing. Meta-trained on synthetically generated, structurally diverse tables with causal priors, Orion-Bix learns transferable inductive biases across heterogeneous data. Delivered as a scikit-learn compatible foundation model, it outperforms gradient-boosting baselines and remains competitive with state-of-the-art tabular foundation models on public benchmarks, showing that biaxial attention with episodic meta-training enables robust, few-shot-ready tabular learning. The model is publicly available at https://github.com/Lexsi-Labs/Orion-BiX .



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: quality, clarity, originality and significance. Overview: The paper proposes a framework for enforcing structure in Bayesian models via structured prior selection based on the maximum entropy principle. Although the optimal prior may not be tractable, the authors develop an approximation method using submodular optimization. Constructing priors with structured variables is an important topic, so this method should be able to have a good impact. Quality: The paper is technically sound.



Learned ISTA with Error-based Thresholding for Adaptive Sparse Coding

Li, Ziang, Wu, Kailun, Guo, Yiwen, Zhang, Changshui

arXiv.org Artificial Intelligence

Also, it leads to poor generalization to test data with a different distribution (or sparsity) from the training data. To address the above issues, we propose an error-based thresholding (EBT) mechanism for LISTA-based models to improve their adaptivity, which utilizes a function of the layer-wise reconstruction error to suggest a specific threshold for each observation in the shrinkage function of each layer. EBT introduces a function of the evolving estimation error to provide each threshold in the shrinkage functions of the model, and we show that it well disentangles the learnable parameters in the shrinkage functions from the reconstruction errors, endowing the obtained models with improved adaptivity to possible data variations. It has no extra learnable parameter compared with original LISTA-based models, yet shows significantly better performance. With rigorous analyses, we further show that the proposed EBT also leads to a faster convergence.
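A minimal sketch of the error-based idea, as read from the abstract rather than from the authors' code: each layer's threshold is computed from the current reconstruction error instead of being a free learned parameter, so it adapts per observation and vanishes as the residual vanishes. The scaling constant `c`, the step size, and the tiny identity-matrix problem are illustrative assumptions.

```python
# Sketch of an error-based threshold: theta = c * ||A x - b||_2, so the
# shrinkage adapts to each observation instead of using a fixed theta.
# All names (A, b, c, step) are illustrative, not from the paper.

def soft(v, t):
    """Soft-thresholding (shrinkage) operator: sign(v) * max(|v| - t, 0)."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

def ebt_layer(A, b, x, step, c):
    m, n = len(A), len(A[0])
    # reconstruction error r = A x - b drives the threshold
    r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
    theta = c * sum(ri * ri for ri in r) ** 0.5
    # gradient of the data term: g = A^T r
    g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
    return [soft(x[j] - step * g[j], theta) for j in range(n)]

# Tiny demo: as x approaches the solution of A x = b, theta shrinks to 0,
# so the fixed point is unbiased.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 0.0]
x = [0.0, 0.0]
for _ in range(50):
    x = ebt_layer(A, b, x, step=0.5, c=0.01)
```

Because the threshold is tied to the residual rather than learned, the same layer behaves sensibly on observations with different error magnitudes, which is the adaptivity the abstract refers to.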


Hyperparameter Tuning is All You Need for LISTA

Chen, Xiaohan, Liu, Jialin, Wang, Zhangyang, Yin, Wotao

arXiv.org Machine Learning

Learned Iterative Shrinkage-Thresholding Algorithm (LISTA) introduces the concept of unrolling an iterative algorithm and training it like a neural network. It has had great success on sparse recovery. In this paper, we show that adding momentum to intermediate variables in the LISTA network achieves a better convergence rate and, in particular, the network with instance-optimal parameters is superlinearly convergent. Moreover, our new theoretical results lead to a practical approach of automatically and adaptively calculating the parameters of a LISTA network layer based on its previous layers. Perhaps most surprisingly, such an adaptive-parameter procedure reduces the training of LISTA to tuning only three hyperparameters from data: a new record set in the context of the recent advances on trimming down LISTA complexity. We call this new ultra-lightweight network HyperLISTA. Compared to state-of-the-art LISTA models, HyperLISTA achieves almost the same performance on seen data distributions and performs better when tested on unseen distributions (specifically, those with different sparsity levels and nonzero magnitudes).
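The momentum idea can be sketched, in the spirit of FISTA-style extrapolation, as a shrinkage step applied at an extrapolated point. The fixed momentum weight `beta` here is an assumption for illustration; the paper's instance-optimal parameters are layer-wise and data-dependent, not a single constant.

```python
# Sketch of ISTA with momentum on the intermediate iterate: each step
# shrinks at the extrapolated point y = x + beta * (x - x_prev).
# beta, step, lam and the tiny problem are illustrative only.

def soft(v, t):
    """Soft-thresholding operator."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

def momentum_ista(A, b, lam, step, beta, iters):
    m, n = len(A), len(A[0])
    x = [0.0] * n
    x_prev = x[:]
    for _ in range(iters):
        # extrapolated (momentum) point
        y = [x[j] + beta * (x[j] - x_prev[j]) for j in range(n)]
        # gradient of 0.5*||A y - b||^2: g = A^T (A y - b)
        r = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x_prev = x
        x = [soft(y[j] - step * g[j], lam * step) for j in range(n)]
    return x

# Tiny demo: for identity A the lasso solution is soft(b, lam) = (1.9, 0).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [2.0, 0.0]
x_hat = momentum_ista(A, b, lam=0.1, step=0.9, beta=0.3, iters=100)
```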


Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds

Chen, Xiaohan, Liu, Jialin, Wang, Zhangyang, Yin, Wotao

Neural Information Processing Systems

In recent years, unfolding iterative algorithms as neural networks has become an empirical success in solving sparse recovery problems. However, its theoretical understanding is still immature, which prevents us from fully utilizing the power of neural networks. In this work, we study unfolded ISTA (Iterative Shrinkage Thresholding Algorithm) for sparse signal recovery. We introduce a weight structure that is necessary for asymptotic convergence to the true sparse signal. With this structure, unfolded ISTA can attain a linear convergence, which is better than the sublinear convergence of ISTA/FISTA in general cases. Furthermore, we propose to incorporate thresholding in the network to perform support selection, which is easy to implement and able to boost the convergence rate both theoretically and empirically. Extensive simulations, including sparse vector recovery and a compressive sensing experiment on real image data, corroborate our theoretical results and demonstrate their practical usefulness. We have made our codes publicly available.
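The support-selection step mentioned above can be sketched as a shrinkage operator that exempts the largest-magnitude entries, which are trusted to lie on the true support, from thresholding. The top-`p` selection rule and all names here are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of soft-thresholding with support selection: the p entries with
# largest magnitude pass through unshrunk; the rest are soft-thresholded.
# The selection rule and parameter names are illustrative.

def soft(v, t):
    """Soft-thresholding operator."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

def ss_soft_threshold(x, theta, p):
    """Threshold x, but leave the p entries with largest |x| untouched."""
    keep = set(sorted(range(len(x)), key=lambda j: -abs(x[j]))[:p])
    return [x[j] if j in keep else soft(x[j], theta) for j in range(len(x))]

z = [0.9, -0.05, 0.4, 0.02]
out = ss_soft_threshold(z, theta=0.1, p=1)
```

With `p = 1`, the largest entry (0.9) is kept as-is while the remaining entries are shrunk, so a confidently detected support coordinate avoids the bias that soft-thresholding would otherwise introduce.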
